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Considerable attention has been paid, in recent years, to the use of networks in modeling com- 
plex real- world systems. Among the many dynamical processes involving networks, propagation 
processes — in which final state can be obtained by studying the underlying network percolation 
properties — have raised formidable interest. In this paper, we present a bond percolation model 
of multitype networks with an arbitrary joint degree distribution that allows heterogeneity in the 
edge occupation probability. As previously demonstrated, the multitype approach allows many non- 
trivial mixing patterns such as assortativity and clustering between nodes. We derive a number of 
useful statistical properties of multitype networks as well as a general phase transition criterion. We 
also demonstrate that a number of previous models based on probability generating functions are 
special cases of the proposed formalism. We further show that the multitype approach, by natu- 
rally allowing heterogeneity in the bond occupation probability, overcomes some of the correlation 
issues encountered by previous models. We illustrate this point in the context of contact network 
epidemiology. 

PACS numbers: 89.75.Hc,87.23.Ge,05.70.Fh,64.60.ah 



I. INTRODUCTION 

The end of the XXth century has witnessed increasing 
interest among the scientific community for the use of 
complex networks [l|, [H, [H, E| as models for many real- 
world systems, both from empirical and theoretical per- 
spectives. From the empirical point of view, scientists 
have studied real-world networks to highlight universal 
topological properties such as the Small- World effect || , 
highly skewed de gree distributions @, 0, H, @, H3| or as- 



sortative mixing jllj. On the theoretical side, models 



have been developed to describe or explain topological 
properties of networks [H, EH EH EH > to simulate their 
evolution in ti me 11611 a nd the dynamical processes taking 
place on them dMl EI HI El • 

The first models were rather simple: indistinguishable 
nodes joined by randomly placed edges [22|. With in- 
creasing information on real- world networks, more re- 
alistic — and thus more complex — models have been 
proposed taking into account properties such as an arbi- 
trary degree distribution [l3[ , cluste ring [23l . Izj ] , degree 
correlation [25ll. weig hed edges [26|, |27| . directed edges 
0, HI, Hi,[35M Hi HI or mixing patterns [pJH]- Ex- 
cept for a few cases (e.g. bipartite networks [13j), many 
existing models still consider only one type of nodes and 
therefore neglect any information characterizing the dif- 
ferences among the constituents of the simulated system. 
However, especially in social networks, these differences 
(e.g. sex, age, ethnic group) may have significant and 
non-trivial effects on the structure (e.g. assortative mix- 



ing, communities) and on the dynamical property(ies) of 
the networks themselves as well as on the dynamics of 
the phenomena of interest (such as disease propagation) 
throughout the networks [HI, H|- 

In this paper, we present a bond percolation formal- 
ism of multitype networks with an arbitrary joint degree 
distribution where nodes have explicit properties associ- 
ated with the type they belong to. On the one hand, 
the use of multitype networks allows one to reproduce 
mixing am ong nodes such as assortative mixing [25| or 
clustering [23(]. On the other hand, the use of heteroge- 
neous bond occupation probability allows one to take into 
account correlations between the probability of occupa- 
tion of edges and the nature of the nodes they connect. 
When applied to epidemic propagation, we argue that 
this model adequately represents percolation (spreading) 
processes where such correlations are observed (e.g. in- 
fectious diseases whose probability of transmission is cor- 
related with intrinsic physiological and behavioral char- 
acteristics of individuals). 

Our paper is organized as follows. In Sec. UH we in- 
troduce the multitype networks and define several quan- 
tities of interest. The formalism is developed in Sec. IIIII 
where we obtain the occupied degree (and excess degree) 
distributions, the small component sizes, the percolation 
threshold, and the giant and (average) small component 
sizes. We also show that our formalism corresponds to 
a generalization of existing approaches [19 . l25l |37| re- 
ducing to known results in the appropriate limits. This 
theoretical section is validated with a number of numer- 
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FIG. 1: (Color online) Schematic representation of an undi- 
rected multitype network with M — 4, N — 33, wi = 
IU2 = §, W3 = ^- and W4 = ^ where types 1,2,3 and 4 re- 
fer to squares, circles, triangles and diamonds respectively. 
Edges running between nodes are bidirectional and can thus 
be followed in either directions. 



ical simulations and followed by an application to epi- 
demic dynamics in Sec. IIV1 where the previously cal- 
culated quantities are interpreted in an epidemiological 
context. We also take the opportunity to explain how the 
proposed approach can overcome some of the correlation 
issues that should appear in a realistic treatment of epi- 
demic propagation. Our conclusions and final remarks 
are then collected in the last section. 



II. MULTITYPE NETWORKS 

We consider undirected multitype networks [HI, [H[ 
defined as undirected networks composed of N nodes, 
each of which are labeled with one of M possible types. 
Type-i nodes occupy a fraction lOj of the network and 
the connections between nodes are prescribed by the de- 
gree distribution Pi{k\, k-i, . . . , fcjvr) = Pi(k) giving the 
joint probability for a randomly chosen type-i node to 
be connected to fci type-1 nodes, k 2 type-2 nodes, . . . , 
fcjvf type-M nodes. Any mixing patterns between nodes 
such as assortative mixing are incorporated in the model 
via Pi(k). Our networks are considered in the limit of 
large systems (N — > 00) and are totally random in all 
respects other than the joint degree distribution Pi(k) 
|54j |. Therefore, Pi(k) and Wi define a network ensemble 
over which all quantities obtained with our formalism 
are averaged. Fig. [T] shows an example of an undirected 
multitype network. 

We now define zij as the average number of edges leav- 
ing a type-i node to type-j nodes, directly obtained from 
Pi(k) as 

00 00 00 
*i = E • • • E fc * P *(*i' ■ • ■ ' k ^ ^ E k i P ^- (!) 

fei=0 k M =0 k=0 

Even if every edge in our networks is undirected, the pres- 
ence of different types of nodes adds an artificial direc- 



tion to edges. Indeed, one can follow a link from a type-i 
node to a type-j node (noted i — > j) or in the opposite 
direction (j — » i). Since the degree distribution Pj(fc) 
prescribes the number of edges leaving type-« nodes, a 
given edge joining a type-i and a type-j node will be 
considered from two different perspectives. Therefore, to 
guarantee the consistency of the network ensemble, Pi(k) 
and Wi must respect the condition 



wz 



(wzf 



(2) 



where 



wi 
ui2 





w M 



Z21 



Z12 
Z22 



ZMl ZM2 



Z\M 
Z 2 M 

zmm 



when N — > 00. This constraint relies on having as many 
edges of type i — > j as of type j — > i 55]. Note that @ 
implies (^) non-trivial generally overdetermined equa- 
tions. Thus, one must use values for Pi{k) and Wi that 
explicitly fulfill |2j). The case M = 2 is special in that Wi 
can be uniquely determined from Pi(k) 



Wi 



Z21 



Z\2 + Z 2 \ 



W 2 



Zl2 



Z\2 + Z 2 i 



with Tr(w) = 1. 

Having now defined networks where one can identify 
the type of nodes, we are able to apply different proba- 
bilities of occupation according to how edges are followed. 
Thus, instead of having only one edge occupation prob- 
ability T, as in other percolation models [19j . we define 
the bond occupation probability matrix 



Tn T12 
T21 T 2 2 



Tim 
T2M 



Tmi T, 



M2 



T. 



MM 



(3) 



where Ty is the occupation probability of the i — > j 
edges. Note that T does not need to be symmetric and 
the probability of occupation of i — > j edges can vary 
between edges of the same type (i.e. linking the same 
ordered pair ij) as long as those values are independent 
identically distributed (iid) random variables. The value 
of Tij is then simply the mean of their distribution (l9| 
and is totally independent of Tji , which is the mean of a 
different and independent distribution. 

In view of the possible asymmetry of the probability of 
occupation, our approach is somewhat different from the 
traditional bond percolation treatment which assumes a 
symmetric T. It would perhaps be more appropriate 
to refer to our system as a semi- directed bond percola- 
tion. This denomination stems from the following point 
of view: one formally replaces every edge of the original 
undirected network by two directed edges running in op- 
posite directions and then uses the corresponding prob- 
ability of occupation for each directed edges. This leads 
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to a semi-directed network whose percolation properties 
are easier to analyse. Therefore, the introduction of mul- 
titypes together with the tranmissibility matrix allows 
us to cover systems ranging from classical bond percola- 
tion (symmetric T) to spreading processes (asymmetric 
symmetric T) where directionality (e.g. causality) is im- 
plicitly present. On this basis, we develop the multitype 
formalism in the next section. 



III. FORMALISM 

We now present a formalism that describes the het- 
erogeneous bond percolation of multitype networks. It 
is based on probability generating functions (PGF) [391 ] 
and is a generalization to multitype networks of the for- 
malism developed earlier by Newman (l9j . 



A. Occupied Degree Distribution 

The first quantity needed to describe the percolation 
properties is the occupied degree distribution Pi(k), i.e. 
the distribution of the number of occupied edges leaving 
a randomly chosen type-i node. Assuming independence 
in the edges' occupation state, the probability that a ran- 
domly chosen degree-A; node has k occupied edges is 



1 = 1 



P i (k\k) = l[U)(T il )^(l-T il ) 



\ki —hi 



(4) 



The probability that a randomly chosen node has fc oc- 
cupied edges is then simply 



Pi(k) = ]T Pi(k\k)Pi(k) 



k=k 



M 



= E p '( fe )Il (z )(Ta) h (l-T a ) 



\ki —ki 



(5) 



where the summation convention is defined in ([T]) here 
covering the ranges ki < ki < oo for 1 < I < M. 
This probability is generated by the PGF Gi(x;T) = 
Gi(xi, . . . ,x M ;T) 



M 



G l (x;T) = J2^(k)]l x i l 



k=0 1=1 

OO M fej /, x _ _ 

fe=0 i=lfc !=0 V l/ 



M 



= E^( fc )Il[ 1 + ^- 1 )^] 



(6) 



fe=0 



We see that Gj(l; T) = 1 if Pi(k) is properly normalized. 
We can obtain the average occupied degree Zij, i.e. the 



average number of occupied edges leaving a type-i node 
to type- j nodes, by using the differentiation property [l3[ 
of generating functions 

_ _ dGj(l;T) 

Zl3 ~ dx A 



TijJ^kjPiik) 



k=0 

Tij Zij 



(7) 



where Zij is the average degree defined by ([!]). 



B. Occupied Excess Degree Distribution 

Another useful and accessible quantity in our formal- 
ism is the occupied excess degree distribution Qij(k). 
The excess degree is defined as the number of edges 
leaving a node that have been reached by following a 
randomly chosen edge. For undirected unitype networks 
(RI = 1), this quantity is simply the node's degree minus 
one (the edge that has already been followed). More in- 
formation is required for multitype networks; one needs 
to know the type of node at both ends of the followed 
edge to correctly calculate the excess degree. This quan- 
tity is proportional to fcjPj(fe) since high degree nodes 
are more likely to be reached from a randomly chosen 
edge than low degree nodes. Assuming independence in 
the occupation state of edges, the occupied excess degree 
distribution of a type-j node reached from an i — > j edge 
is given by 



1 °° 



k=k 



M 



1 = 1 



Ilk (8) 



where Pj(k + Si) = Pj{ki + Su, . . . , k M + <W) and is 
the delta of Kronecker. Defining Fij(x; T) as the gener- 
ating function associated with this distribution, we have 



M 



F ij (x;T)=J2Qij(k)U x l' 
fc=o 1=1 

= —Y i k i P j (k)l[[l + (x l -l)T Jl \ k - 



ki —S i: 



k=0 



1=1 



which can also be obtained from Gi(x; T) by differentia- 
tion 



F ZJ (x;T) 



1 dGj(x;T) 



(9) 



where Zij is the average occupied degree defined by (J7J). 



C. Small Components Size Distribution 

We now wish to calculate the size distribution of small 
components in the network ensemble. A component is 
any closed set (cluster) of nodes connected by occupied 
edges. The adjective small is meant to qualify any in- 
tensive component (i.e. one that does not scale with the 
network size). Let us first define Hij(x;T) as the func- 
tion generating the size distribution of the component 
reached by following an i — > j edge. Small components 
are typically finite, except at the phase transition where 
their average size diverges [ijj . Thus, we expect the prob- 
ability of finding closed loops in finite components to go 
as 0(N~ 1 ), which is negligible in the large system size 
limit (N — > oo). Small components are therefore treelike 
in structure and H i j(x;'T) can be decomposed in an ad- 
ditive set of contributions as graphically shown at Fig. [5] 
for the case M = 2. The size distribution of a small clus- 
ter reached from an i — ► j edge arises from two situations: 
either the edge reaches a node that has no outgoing occu- 
pied edges (i.e. occupied excess degree = 0), or it reaches 
a node that has outgoing occupied edges (i.e. occupied 
excess degre ^ 0) that lead to other clusters whose size 
distribution is given by H (x; T) as well (56[. 

Noting that the distribution of outgoing edges is given 
by Fij(x; T) and using the power property [13| of gener- 
ating functions leads to the consistency relation 

Hij{x; T) = XjFij (Hj(x; T); T) (10) 

where the right-hand side of the equation must be read as 

F ij (H jl (x; T), . . . , H jM (x; T); T). The solution to 

is found by seeking the stable fixed point of the mapping 

H%\x;T)=x j F ij (H< j n - 1 \x;T);T) 

as n — > oo for initial conditions (x; T) = Xj. Techni- 
cally, the existence of a stable fixed point is guaranteed by 
the presence of the Xj factor in (fTU|) . Indeed, it implies 
that the coefficients in front of variables whose powers 
sum to n (e.g. x^ 1 . . . x k ^ with X^=i = n ) are exact 
after precisely n + 1 iterations. 

Let us next consider a randomly chosen type-i node. 
Each of its leaving edges leads to a component whose 
size distribution is generated by Hij(x;T). Defining 
Ki(x;T) as the function generating the size distribution 
of the whole component, we have 

K i (x;T) = x i G i (H i (x;T);T). (11) 

Since type-i nodes occupy a fraction Wi of the network, 
the size distribution of the component reached from a 
randomly chosen node is generated by 

M 

K(x;T) = J2^K l (x;T) 

8=1 

M 

= ^w i x i G i (H i (x;T);T). (12) 

i=l 



4 

M+ j + ^YWTTT- 
t f • • v v y y*"*"*" 



FIG. 2: (Color online) Illustration of the consistency relation 
(|10[) for M = 2. The boxes represent the component reached 
by an i — > j edge and the circles stand for the type-j node first 
reached. The color of the edges refer to the type of the node 
from which one has arrived and the color of the box/circle 
stands for the type of the node reached first. 



Similar equations to (fT0|) and (fTT|) have already been de- 
rived in [251 ] to obtain the size distribution of the small 
component reached from a randomly chosen type-i node. 
However, the PGFs used there were functions of only one 
variable instead of M (i.e. x instead of x) and therefore 
did not generate the composition of the small compo- 
nent (i.e. the number of nodes of each type). Thus, (flQ]) 
and (jlip are generalized versions of previoulsy derived 
expressions. 

D. Percolation Threshold 

Percolation is usually characterized by the divergence 
of the correlation length. This translates here in the 
divergence of the average size (s) of small components. 
Using the moments property [1 31 ] of PGFs, the average 
number of type-i nodes in the small component reached 
from a randomly chosen node is obtained by differentiat- 
ing (fT2"j) with respect to Xi 

M M 

(Si) = W, + ^ W l X! * l 3 a< lj ( 13 ) 
1=1 3 '=1 

where aft = dHl3 d ^ c ' T ' > are the solutions of 

i la;— 1 

M 

<$ = *«+E r *4 B)a i2- (14) 

n=l 

We have isolated in (jT4j) the average number of type-n 
nodes that can be reached from a type-j node arrived at 
by following a I — * j edge (average excess degree) 

(n) _ 1 dF tj (x;T) 

Plj ~ T dx 1 j 

x jn ux n aj=1 

which only depends on the network structure (i.e. its 
degree distribution) and is therefore known. Thus, to 
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obtain (sj), one simply has to solve (fl4|) . M sets of M 2 
equations and M 2 unknowns. It can be shown that all 

ay are inversely proportional to det (I — A) where I is 
the identity matrix and A is a MxM block matrix whose 
blocks (Ay) are themselves MxM matrices with 



— 7>. .fl(j)jf. 



(16) 



giving the (fi, v) element of the (i, j) block. For example, 
in the case M — 2, A takes the form 



A = 













Tup® 











(17) 



From (H31), we see that the average size (s) of the com- 
ponent reached from a randomly chosen node diverges 
as 

M 



(«> = £<*> 



1 



oc 



det (I - A) 



(18) 



for det (I — A) — > 0. Therefore the phase transition hap- 
pens when det (I — A) = which marks the point where 
the giant component first appears. This result is in ac- 
cord with the corresponding expression found in [25[ and, 
as noted earlier, is again more general. 



E. Giant Component 

Beyond the percolation threshold, there is an exten- 
sive cluster (the giant component) in the network. In 
Sec. IIIICl Hij(x;T) has been defined as generating the 
size distribution of finite components. Thus, Hij(x;T), 
Ki(x;T) and K(x;T) are no longer normalized beyond 
the percolation threshold since it is not guaranteed that 
a randomly chosen edge/node will lead to a finite compo- 
nent (although a fraction of them may lead to the giant 
component). Therefore, the probability that a randomly 
chosen type-« node leads to the giant component is simply 



Vi = 1-^(1; T) 
= l-G i (~h i ;T) 



(19) 



We have noted 



with Gi(hi;T) = Gi(ha,...,hiM;' 
hij = Hij(l;T) (read hij forward) as the probability 
that a randomly chosen i —> j edge leads to a finite com- 
ponent, and is the solution of 

^ = i^(h*;T) (20) 

obtained by evaluating (|10p at x = 1. If one randomly 
chooses a node in the network, the probability that it 
leads to the giant component is therefore 



M 



(21) 



To calculate the size of the giant component, one needs 
to know the probability that a randomly chosen node is 
not linked to the giant component by any of its edges 
(i.e. that this node can not be reached from the giant 
component). One simple way to obtain this information 
is to study the network topology by following every edges 
backwards. This can be achieved with our formalism by 
simply using T T (the transpose of T) instead of T since 
any given type-i node can be left (reached) by any of its 

edges with the probability (Tji). We define hij (read 
hij backward) as the probability that a given type-i node 
can not be reached from the giant component by a j — * i 
edge. This quantity is calculated as solution of 



hij Fij (hj , T 



(22) 



Therefore, we see that Gi(hi \ T T ) is the probability that 
a randomly chosen type-i node does not belong to the 
giant component. The fraction of the network occupied 
by type-i nodes that are in the giant component is thus 
given by 



S, =w l [l-G l (h i - r T T ) 
and the size of the giant component is 



M 



hiiT 1 



(23) 



(24) 



i=i 



In comparing (|2ip and (|24|) , one will see that asymmetry 
of the bond occupation probability matrix implies that 
V ^ S. This quantitative difference between V and S re- 
sides in the asymmetry in the number of occupied edges 
of type i — > j and of type j — > i. A naive generalization 
of the formalism introduced in (l9| would have missed 
the distinction between V and S. Clearly for symmetric 
transmissibility T = T T , one would have V = S. A sim- 
ilar result has previously been discussed in [3l| for semi- 
directed networks and, with different approaches, it has 
been obtained for undirected networks in [36l [5l| . The 
present demonstration is a new extension to the latter 
class of networks. 



F. Average Small Components Size 

Above the percolation threshold, K(x\ T) still gener- 
ates the size distribution of the finite component reached 
from a randomly chosen node, although it needs to be 
normalized according to 

KjxjT) 
1-V 

since P / 0. The general expression for (sj) is therefore 
w = TT~p + Y^V ^ l ^ z ii h 3i a i/ ( 25 ) 
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where is the average number of type-i nodes in the 
finite component reached by a randomly chosen I — » j 
edge and is the solution of 



= F lj (h j ;T)6 i 



M 

E 

n=l 



TjnP, 



(n) (i) 
0" "j"' 



Analogously to (fT5|) , 

g(n) _ 



1 dF l3 (x;T) 



T 

- 1 - t', 



dx r 



(26) 



(27) 



x—h-j 



One can see that ([2"5 ]) - (|2?|) reduce to dl3|)— ((151) in ab- 
sence of the giant component since in this case V = 
and hij — 1. Technically, (|2l)]) - (f2"T|) can be very useful 
to obtain information on the small components without 
having to solve (| 10 |) — (p~2j) . a very time consuming opera- 
tion for large M or for networks with large small compo- 
nents. It is also possible to calculate the second moments 
ofK(x;T) 



(SiSj) 



1 



1 - V dx. 



dK(x;T) 



dx. 



(28) 



from which the covariance matrix of the small compo- 
nents size (cov{si, Sj} = (siSj) — (s^ (sj) ) is obtained. 
Clearly, iterative equations for (sjSj), similar to (|25[) and 
(|26|) . can be derived to calculate the covariance matrix 
without solving Higher moments can also be 

obtained in a similar way. 



G. Special Cases 

We now show that, in corresponding limit cases, our 
formalism reproduces the already published theoretical 
results. Firstly, one can easily verify that all of the e qua- 
tions in the previous section reduce to the ones in |19j 
when M = 1. Secondly, equations associated with the 
components (small or giant) in ,25j can be obtained by 
setting Tij = 1 V i,j in our equations and x — x in (jTUJ) 
and (fTTj) . Thirdly, results obtained from a semi-directed 
formalism such as the one in [3l[ can also be obtained 
with our formalism by setting Tij — for some ij pairs 
while keeping Tji ^ 0. Fourthly, for bipartite networks 
(M = 2), all edges are connecting different types of nodes 
and the constraint 

Pi(fci,fe)=0 V h^O 
must be imposed, implying that 



0; #=0; 0. 



(i) 
ji 



0; 



a 



0. 



Fa(xi,X2',T) and Hu(xi, xi\, T) arc then undefined. 
From (fl6|) . we see that the phase transition in this case, 
dct (I — A) =0, occurs when 

T'i2T , 2i/3 1 2' /?2i = 1 , 



a result previously obtained in [13, [33] ■ Moreover, one 
can obtain the average number of type-1 nodes in the 
component reached from a randomly chosen type-1 node 
(si)-, under the percolation threshold by differentiating 
(fTTj) with respect to Xi and solving (|26|) 



1 - T 12 T 21 /3«^ ' 



a result also obtained by the authors of [T^, [13] ■ Fur- 
thermore, it is possible to calculate the size of the giant 
component as in [33] by setting xi = \ in (fTOf and (fTT|) 
with the constraints listed above. 

An even more general constraint Pj(fe) = V hi ^ 
can be used to obtain a formalism for multipartite net- 
works. Our approach can therefore incorporate clustering 
effects by assigning some of the node types to groups and 
then using the projected network (where nodes belong- 
ing to the same group are linked together) as proposed 
in El. 



IV. APPLICATION TO EPIDEMIOLOGY 



Over the years, mathematical models |41|,|42j have pro- 
vided insights on the factors influencing diseases prop- 
agation dynamics and have improved testing interven- 
tion/prevention strate gies . Outbreaks of respiratory 
pathogens (e.g. SARS [4j]) and sexually transmitted in- 
fections (STIs) have encouraged the emergence of mod- 
els using network theory to capture the patterns of po- 
tential disease-causing interactions between individuals 
[HJ US [U El]- Despite the many successes, most of these 
models are still based on a simplifying assumption, which 
limits the realistic simulation of disease propagation for 
certain categories of diseases. Before discussing how the 
quantities obtained from our formalism can be translated 
in an epidemiological setting, we briefly state some of the 
difficulties associated with a realistic epidemic dynamics 
and the possible advantage of a multitype description. 



A. Failure of the iid hypothesis and heterogeneity 

The iid hypothesis [l9j assumes that the probability 
of transmission between any pair of individuals is an in- 
dependent identically distributed random variable taken 
from a given distribution. Thus, the a priori probabil- 
ity of transmission, T, between any two individuals is 
the mean of this distribution and, in the population as a 
whole, the disease will propagate from an infectious in- 
dividual to a susceptible one with the same probability 
T. This implies that no correlations, whatsoever, can be 
taken into account. 

However, the probability of transmission of infectious 
diseases is typically dependent on intrinsic immunologi- 
cal and behavioral traits of individuals. Many infectious 
diseases show heterogeneity in their transmissibility. For 
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example, the human immunodeficiency virus (HIV) has 
a higher efficiency of transmission from male to female 
than female to male [SB], [4(|. There is also strong evi- 
dences that co-infection with other STIs could facilitate 
HIV transmission [4?], [48|, |49[. In regards to influenza, 
it has been shown that children (under 15 years old) are 
more likely to transmit the disease than adults [50] . Fur- 
ther, it has been shown that the iid hypothesis fails to 
adequately model susceptible-infectious-recovered (SIR) 
dynamics when the distribution of the infectious period 
P(t) is not sharp around a given value t [5l|,[52j]. There- 
fore, most existing percolation approaches fail to realisti- 
cally simulate the propagation of some infectious diseases 
due to the inability of the iid hypothesis to model the 
correlations between individual's traits (including their 
infectious period) and the probability of transmission. 

If one could identify specific individuals within the net- 
work, one could determine who infects whom and it would 
become possible to apply the appropriate probability of 
transmission. Hence, difficulties raised by the hetero- 
geneity in transmissibility could be largely overcome by 
considering node heterogeneity. This suggests that, in 
order to properly model the propagation of a large class 
of diseases, one could separate the nodes into a suffi- 
cient number of categories (types) to insure that the iid 
hypothesis can be applied correctly. Our multitype for- 
malism can then be used to investigate the percolation 
properties of the corresponding system. 

Confronted with a situation where the infectious pe- 
riod is broadly distributed and heterogeneity is present, 
one could adopt the following line of action. In the case of 
influenza, one could split the population between adults 
and children; or between male and female when modeling 
HIV propagation. The probability of transmission could 
still vary according to the iid hypothesis, within the same 
type of edges, if nodes are separated into a sufficient num- 
ber of groups, within each of which all significant corre- 
lations are explicitly included. The finite width of the in- 
fectious period distribution P(t) can be accounted for by 
simply dividing its contribution into a sufficient number 
of duration subdomains [Ti_i,Tj] (each associated with 
a node type randomly distributed in the population if 
more detailed information is not available) and using the 
corresponding transmissibility in our model. The frac- 
tion of the network occupied by type-i nodes will then 
be in, = Q_ P(r)dT. The same procedure is also appli- 
cable if the susceptibility of individuals is heterogeneous. 



B. Epidemiological quantities 

We now interpret the quantities that can be calcu- 
lated with our formalism in an epidemiological context. 
The contact network topology is prescribed by Pj(fe) and 
Wi while the bond occupation probability matrix entries, 
T{j , are the average probability of transmission from in- 
fectious individuals of type i to susceptible individuals of 
type j. Ki(x; T) generates the outbreak size distribution 



caused by patient zero (i.e. the first known individual 
to become infected who directly or indirectly causes all 
subsequent infections) of type-i {e.g. adult, child; male, 
female). Similarly, K(x;T) generates the outbreak size 
distribution caused by a patient zero of any type. The 
quantity (sj) is the average number of type-i individu- 
als infected from patient zero and (s) — X)f=i ( s i) 1S ^ ne 
expected size of an outbreak. One could also differenti- 
ate (jlip with respect to Xj to obtain the average number 
of type-j individuals infected by a patient zero of type-i 
(see Sec. IIII G|) . Those quantities can be useful, for ex- 
ample, in evaluating the impact of strategies focused on 
the reduction of morbidity in specific population groups 
(e.g. health care workers, the elderly). 

For a given contact network, det (I — A) is a polyno- 
mial in powers of the elements of T whose coefficients 
depend only of the network topology. Thus, the percola- 
tion threshold, det (I — A) = 0, defines the critical trans- 
missibility set over which there is a non-zero probability 
that an outbreak will turn into a large-scale epidemic. 
The probability that such an epidemic will occur is given 
by V and by V\ if patient zero is known to be of type-i. 
Should an epidemic occur, the fraction of the population 
that will eventually be infected is given by S while Si 
indicates the fraction of the population of infected indi- 
viduals of type i. Note that if an outbreak dies out while 
having infected only a finite number of individuals (or a 
small number compared to the size of the population), 
the expected number of infected individuals of type i is 
still given by (s,) computed with ([25]) (this remark holds 
for (s) as well). 



C. Numerical Simulations 

To illustrate how our formalism could be applied in an 
epidemiological context and to confirm its predictions, 
we have performed extensive computer simulations on 
multitype networks of TV = 10 5 nodes divided into two 
types (M = 2). We have considered a contact network 
where the distribution of the total degree (ki + k 2 ) of 
individuals is given by a power-law with an exponential 
cutoff and where the probability that an edge leaving 
a type-i node arrives on a type-j node is given by pij. 
Thus, the joint degree distribution of our network is 



Pi(ki,k 2 ) = 



(ki +/c 2 )-^ e -( fel+fe2 )/ K > 



Li Vi (e-V*) 



kx + k 

h 



2\ fci k 2 

Pa Pi2 



with the parameters 





' 1 " 




' 8 " 




'0.7 0.3" 


V = 


2 


; « = 


10 


; p = 


0.4 0.6 



Li^ (z) denotes the 77th polylogarithm of z [53] (also 
known as Jonquiere's function). We have used a simple 
joint degree distribution to illustrate our point; nonethe- 
less our formalism is very general and Pi(k) could include 
many non-trivial correlations as shown in [25J. To show 
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FIG. 3: (Color online) Size distribution of small components 
obtained by numerical simulations (symbols) compared with 
the theoretical prediction of (|12p (lines). A and <£> corre- 
spond to the number of type-1 nodes in small components 
(generated by K(xi,l;T)) for 7 = 0.1 and 7 = 0.5 respec- 
tively. O an d C are the equivalent quantities but for type-2 
nodes (generated by K(l, xi\ T)) . 



the effect of the asymmetry of T on V and <S, we have 
used the following transmissibility matrix 

[ 0.95 0.98" 
7 [ 0.48 1.00 

where 7 allows us to vary the infectiousness of the disease. 
The specific choice of the elements of T has no particular 
relevance here, except perhaps to result in large V and S 
values for 7=1. By solving det (I — A) = for 7, we find 
that the epidemic transition occurs when 7 C ~ 0.1834. 

We have generated 2000 multitype networks follow- 
ing a method similar to the one described in [25| . with 
the degree distribution presented above. We have then 
performed epidemic simulations by infecting a randomly 
chosen node and allowing the disease to propagate with 
probabilities given by T. Above the percolation thresh- 
old, we have identified the components (small or giant) by 
setting a size-parameter, a percentage of the total num- 
ber of nodes, below which the cluster was registered to 
belong to the set of small components. Experimentation 
has shown the final results rather insensitive to the exact 
value of the size-parameter and we have settled conser- 
vatively for a value of 0.5% of TV. Figure [3] compares the 
distribution of the number of infected nodes of each type 
caused by an outbreak predicted by (fT2"|) . with the re- 
sults of numerical simulations under (7 = 0.1) and above 
(7 = 0.5) the epidemic threshold. One observes a very 
good agreement between the theoretical prediction and 
the simulations. This quantitative accord (in this figure 
and the following ones) is representative of a much larger 
set of calculations carried out with different values of the 
transmissibility matrix elements. Figure U shows the av- 
erage number of infected nodes in an outbreak for each 
type of node and for different values of 7. Theoretical 
predictions are obtained from (f2"5")) . Again, an excellent 
agreement between our model predictions and numerical 
simulations is recorded; the small disagreement around 



the percolation threshold is caused by the finite size of 
the networks used for the simulations. Indeed as N de- 
creases, finite size effects become important and the for- 
malism would have to be modified along the lines de- 
scribed in [52], for instance. Preliminary results indicate 
however that agreement between results of the present 
formalism and numerical simulations is maintained, even 
if the size of the network is reduced to N = 1000. A more 
extensive study of the issue of finite size in a multitype 
network is under investigation. 

Finally, Fig. [5] compares the values predicted by our 
model for the probability of an epidemic to occur (V) 
and its relative size (<Si, S2 and S) with simulation re- 
sults for different values of 7. Again, there is a very good 
agreement between the theoretical predictions and results 
from simulations. The asymmetry of T is responsible for 
the significant difference between V and S (up to approxi- 
mately 10% in this case). We also see that the presence of 
node types allows more detailed information on the final 
state of an epidemic since it is then possible to determine 
the number of individuals of each type that are infected 
during an epidemic. Moreover, Fig. [5] demonstrates that 
these numbers do not remain proportional for varying 
transmissibilities. To the best of our knowledge, such in- 
formation was not possible to obtain in previous perco- 
lation models. Furthermore, the heterogeneity in trans- 
missibility in our formalism allows one to test more spe- 
cific public-health policies. For instance, one could study 
the effectiveness of age-specific influenza control strate- 
gies such as vaccination, face masks or hand-washing by 
varying the transmissibility matrix entries for the rele- 
vant age groups. Therefore, the multitype approach of 
our model offers more detailed information on outbreak 
outcomes. This is very useful when comparing the cost- 
effectiveness of prevention or intervention strategies. 



V. CONCLUSION 

In this paper, we have introduced a bond percolation 
formalism on multitype networks. The formalism explic- 
itly allows heterogeneity in the edge occupation proba- 
bility via the matrix T whose elements TJy are the proba- 
bility for an i — > j edge to be occupied. Using probability 
generating functions (PGF), we have obtained several ex- 
act forms of classical statistical properties (in the limit 
of large networks) such as, the size distribution of small 
components, the probability of reaching the giant com- 
ponent from any node as well as its relative size. Fur- 
thermore, the presence of node types has allowed us to 
obtain more detailed information on the composition of 
small components, the giant component, and a general 
expression for the percolation threshold. We have also 
obtained iterative equations for the average number of 
type-i nodes in the small component, which allows to eas- 
ily and rapidly obtain information on the network struc- 
ture. 

We have also shown that our model is a generalized 
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FIG. 4: (Color online) Average number of nodes in small 
components as predicted by (|25[) (lines) compared to simula- 
tions results (symbols; □: type-1 nodes, 0: type-2 nodes and 
A: both) as a function of 7. The vertical dashed black line 
indicates the percolation threshold (7 C ). 




FIG. 5: (Color online) Probability of reaching the giant com- 
ponent from a randomly chosen node ("P, 0)i fraction of the 
network occupied by type-1 nodes (Si, □ ), type-2 nodes (52, 
0) and both node types (S, A) in the giant component. Lines 
stand for theoretical predictions and symbols for simulation 
results. The vertical dashed black line indicates the percola- 
tion threshold (7c)- 



our formalism (with 3M variables, say x, y, z, for the 
3 ways to move across the network, following the links 
forward, backward and in both directions) is straight- 
forward to derive. This extended formalism would be 
required when the underlying network includes directed 
edges whose presence can not be randomly determined 
with a probability that solely depends on the edge 
type (here i — > j), i.e. , additional correlations exist. 

These structural properties have considerable influence 
on the dynamical processes taking place on networks; 
this, in turn, can have a significant impact on their topol- 
ogy. Therefore, a formalism such as the one presented in 
this paper can be used to probe and characterize the 
structure (by setting T^- = 1 V i,j = 1,...,M) of an 
evolving network at a given time in order to predict the 
network's topological evolution. 

The approach described in this paper, when compared 
to previous methods, facilitates more realistic simulations 
of the propagation of infectious diseases manifesting het- 
erogeneity in their transmissibility. We argue that het- 
erogeneity in nodes is a way to overcome some correlation 
issues caused by heterogeneous transmissibility. In addi- 
tion, the presence of different types of nodes allows the 
simulation of many non-trivial mixing patterns observed 
in real-world networks, such as assortativity, the prefer- 
ential connection between different types of nodes; and 
clustering, the fact that nodes belonging to a specific 
group are more likely to be connected to one another 
in the contact network. Thus, the proposed model is 
suitable for more detailed and more precise epidemiolog- 
ical investigations (e.g. impact of intervention or preven- 
tion strategies on specific population groups) resulting 
in more adapted and effective recommendations to pub- 
lic health authorities. Hopefully, models such as the one 
presented in this paper joined with ever increasing theo- 
retical developments will contribute to the improvement 
of public health policies. 



version of various existing approaches based on the PGF. 
Many known results and effects can be obtained with our 
model. For instance, equations describing the bond per- 
colation of multipartite networks can easily be derived 
from our formalism. While semi-directed networks have 
been previously used to simulate the asymmetry between 
population groups infecting each other [3l[, this effect 
can be achieved with our undirected network model by 
setting Tij = for some ij pairs while keeping Tji ^ 0. 
Thus, type-j nodes will be able to infect type-i nodes, 
while transmission in the other direction will not be pos- 
sible. A completely general semi-directed extension of 
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